Guarantees on learning depth-2 neural networks under a data-poisoning attack
In recent times many state-of-the-art machine learning models have been shown
to be fragile to adversarial attacks. In this work we take steps towards a
theoretical understanding of adversarially robust learning with neural nets.
We exhibit a specific class of finite-size neural networks and a non-gradient
stochastic algorithm that tries to recover the weights of the net generating
the realizable true labels, in the presence of an oracle applying a bounded
amount of malicious additive distortion to those labels. We prove (nearly
optimal) trade-offs among the magnitude of the adversarial attack, the
accuracy, and the confidence achieved by the proposed algorithm. Comment: 11 pages
Size Lowerbounds for Deep Operator Networks
Deep Operator Networks (DeepONets) are an increasingly popular paradigm for
solving regression in infinite dimensions, and hence for solving families of
PDEs in one shot. In this work, we aim to establish a first-of-its-kind
data-dependent lower bound on the size of DeepONets required for them to be
able to reduce empirical error on noisy data. In particular, we show that for
low training errors to be obtained on the data, the common output dimension of
the branch and the trunk net must necessarily grow with the number of training
data points. This inspires our experiments with DeepONets solving the
advection-diffusion-reaction PDE, where we demonstrate that, at a fixed model
size, to benefit from an increase in this common output dimension and obtain a
monotonic lowering of the training error, the size of the training data might
necessarily need to scale quadratically with it. Comment: 21 pages, 3 figures
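To make the role of the common output dimension concrete, here is a minimal,
hypothetical sketch of the DeepONet architecture: single-layer branch and
trunk nets that both map into R^q, with the prediction given by their inner
product. The sizes and the tanh activation are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def deeponet(u_sensors, y_query, Wb, Wt):
    """Minimal DeepONet sketch: G(u)(y) ~ <branch(u), trunk(y)>.
    The branch net maps the sensor values of the input function u to R^q,
    the trunk net maps the query location y to R^q, and the prediction is
    their inner product -- q is the common output dimension."""
    b = np.tanh(u_sensors @ Wb)   # branch output, shape (q,)
    t = np.tanh(y_query @ Wt)     # trunk output, shape (q,)
    return b @ t

m, dy, q = 10, 1, 8               # sensors, query dim, common output dim (hypothetical)
Wb = rng.normal(size=(m, q))
Wt = rng.normal(size=(dy, q))

u = rng.normal(size=m)            # the input function sampled at m sensor points
y = rng.normal(size=dy)           # a location at which to evaluate the output function
print(deeponet(u, y, Wb, Wt))     # a scalar prediction
```

The lower bound discussed above concerns how large q must be, as a function of
the data, for such a model to fit noisy training sets well.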
A Study of Neural Training with Iterative Non-Gradient Methods
In this work, we demonstrate provable guarantees on the training of a single
ReLU gate in hitherto unexplored regimes. We give a simple iterative stochastic
algorithm that can train a ReLU gate in the realizable setting in linear time
while using significantly milder conditions on the data distribution than
previous such results.
Leveraging certain additional moment assumptions, we also show a
first-of-its-kind approximate recovery of the true label-generating parameters
under an (online) data-poisoning attack on the true labels, while training a
ReLU gate by the same algorithm. Our guarantee is shown to be nearly optimal in
the worst case, and the accuracy with which it recovers the true weights
degrades gracefully with the probability and magnitude of the attack.
For both the realizable and the non-realizable cases as outlined above, our
analysis allows for mini-batching and computes how the convergence time scales
with the mini-batch size. We corroborate our theorems with simulation results
which also bring to light a striking similarity in trajectories between our
algorithm and the popular SGD algorithm, for which similar guarantees are
still unknown. Comment: 25 pages, 5 figures. Accepted for publication in the
journal "Neural Networks".
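One plausible reading of such an iterative non-gradient scheme is a
GLM-tron-style mini-batch update, which corrects the weights by the prediction
residual but never uses the derivative of the ReLU. The sketch below is an
illustration of that idea in the realizable setting; the step size, batch
size, iteration count, and Gaussian data are assumptions, not the paper's
exact choices.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, batch = 10, 5000, 32

w_star = rng.normal(size=d)                    # true weights (hypothetical)
X = rng.normal(size=(n, d))
y = np.maximum(X @ w_star, 0.0)                # realizable labels from a ReLU gate

w = np.zeros(d)
eta = 0.5 / d                                  # illustrative step size
for _ in range(2000):
    idx = rng.integers(0, n, size=batch)       # mini-batch sampling
    Xb, yb = X[idx], y[idx]
    # GLM-tron-style update: the residual multiplies the raw input, and no
    # derivative of the ReLU appears, so this is not a (stochastic) gradient step.
    w += eta * (yb - np.maximum(Xb @ w, 0.0)) @ Xb / batch

print(np.linalg.norm(w - w_star))              # recovery error; should be small
```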
Global Convergence of SGD On Two Layer Neural Nets
In this note we demonstrate provable convergence of SGD to the global minima
of appropriately regularized empirical risk of depth-2 nets -- for
arbitrary data and with any number of gates, if they are using adequately
smooth and bounded activations like sigmoid and tanh. We build on the results
in [1] and leverage a constant amount of Frobenius norm regularization on the
weights, along with sampling of the initial weights from an appropriate
distribution. We also give a continuous time SGD convergence result that also
applies to smooth unbounded activations like SoftPlus. Our key idea is to show
the existence of loss functions on constant-sized neural nets which are
"Villani functions". [1] Bin Shi, Weijie J. Su, and Michael I. Jordan. On
learning rates and Schrödinger operators, 2020. arXiv:2004.06977. Comment: 23
pages, 6 figures. Extended abstract accepted at DeepMath 2022. v2 update: new
experiments added in Section 3.2 to study the effect of the regularization
value. The statement of Theorem 3.4 about SoftPlus nets has been improved.
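A minimal sketch of this setting, assuming a one-hidden-layer sigmoid net,
arbitrary labels, Gaussian-sampled initial weights, and hypothetical values
for the Frobenius-norm regularization strength and the step size:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 3, 4, 200                          # input dim, gates, samples (hypothetical)

X = rng.normal(size=(n, d))
y = rng.normal(size=n)                       # arbitrary data: no realizability assumed

W = 0.1 * rng.normal(size=(k, d))            # initial weights sampled from a Gaussian
a = 0.1 * rng.normal(size=k)
lam = 0.1                                    # Frobenius-norm regularization strength
eta = 0.05                                   # step size

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(5000):
    i = rng.integers(n)                      # one-sample SGD
    h = sigmoid(W @ X[i])                    # hidden layer: smooth, bounded activation
    err = a @ h - y[i]
    # gradients of 0.5*err^2 + (lam/2)*(||W||_F^2 + ||a||^2)
    grad_a = err * h + lam * a
    grad_W = np.outer(err * a * h * (1 - h), X[i]) + lam * W
    a -= eta * grad_a
    W -= eta * grad_W

reg_risk = 0.5 * np.mean((sigmoid(X @ W.T) @ a - y) ** 2) \
           + 0.5 * lam * (np.sum(W ** 2) + np.sum(a ** 2))
print(reg_risk)                              # the regularized empirical risk
```

The constant Frobenius-norm penalty and the Gaussian initialization mirror the
two ingredients the note leverages; the convergence guarantee itself is, of
course, the content of the paper, not of this toy loop.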
A Study Of The Mathematics Of Deep Learning
"Deep Learning"/"Deep Neural Nets" is a technological marvel that is now increasingly deployed at the cutting edge of artificial intelligence tasks. This ongoing revolution can be said to have been ignited by the iconic 2012 paper from the University of Toronto titled "ImageNet Classification with Deep Convolutional Neural Networks" by Alex Krizhevsky, Ilya Sutskever and Geoffrey E. Hinton. This paper showed that deep nets can be used to classify images into meaningful categories with almost human-like accuracy. As of 2020 this approach continues to produce unprecedented performance for an ever-widening variety of novel purposes, ranging from playing chess to self-driving cars to experimental astrophysics and high-energy physics. But this newfound astonishing success of deep neural nets in the last few years has hinged on an enormous amount of heuristics, and it has turned out to be extremely challenging to explain with mathematical rigour. In this thesis we take several steps towards building strong theoretical foundations for these new paradigms of deep learning.
Our proofs here can be broadly grouped into three categories:
1.
Understanding Neural Function Spaces
We show new circuit complexity theorems for deep neural functions over real and Boolean inputs and prove classification theorems about these function spaces, which in turn lead to exact algorithms for empirical risk minimization for depth-2 ReLU nets.
We also motivate a measure of complexity of neural functions and leverage techniques from polytope geometry to constructively establish the existence of high-complexity neural functions.
2.
Understanding Deep Learning Algorithms
We give fast iterative stochastic algorithms which can learn near-optimal approximations of the true parameters of a ReLU gate in the realizable setting. (There are improved versions of this result available in our papers https://arxiv.org/abs/2005.01699 and https://arxiv.org/abs/2005.04211 which are not included in the thesis.)
We also establish the first-ever (a) mathematical control on the behaviour of noisy gradient descent on a ReLU gate, and (b) proofs of convergence of stochastic and deterministic versions of the widely used adaptive-gradient deep-learning algorithms, RMSProp and ADAM. This study also includes a first-of-its-kind detailed empirical study of the hyperparameter values and neural net architectures for which these modern algorithms have a significant advantage over classical acceleration-based methods.
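For reference, the RMSProp update analyzed above can be sketched as follows on
a toy convex objective; the hyperparameter values and the objective are
illustrative choices for the demo, not those of the study.

```python
import numpy as np

def rmsprop_step(w, grad, v, eta=1e-2, beta=0.9, eps=1e-8):
    """One RMSProp update: keep an exponential moving average of the squared
    gradient and divide the step coordinate-wise by its square root."""
    v = beta * v + (1 - beta) * grad ** 2
    w = w - eta * grad / (np.sqrt(v) + eps)
    return w, v

# Minimize f(w) = ||w||^2 / 2, whose gradient at w is simply w.
w = np.array([5.0, -3.0])
v = np.zeros_like(w)
for _ in range(3000):
    w, v = rmsprop_step(w, w, v)             # grad of f is w itself
print(w)                                     # drifts toward the minimizer at 0
```

Note that the coordinate-wise division makes the effective step size roughly
constant regardless of the gradient's scale, which is the behaviour whose
convergence such analyses must control.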
3.
Understanding The Risk Of (Stochastic) Neural Nets
We push forward the emergent technology of PAC-Bayesian bounds on the risk of stochastic neural nets to get bounds which are not only empirically smaller than contemporary theories but also demonstrate smaller rates of growth w.r.t. increase in the width and depth of the net in experimental tests. These critically depend on our novel theorems proving noise resilience of nets.
This work also includes an experimental investigation of the geometric properties of the path in weight space that is traced out by the net during training. This leads us to uncover certain seemingly uniform and surprising geometric properties of this process which can potentially be leveraged into better bounds in the future.